Systems and Methods for Logic Verification

ABSTRACT

Methods and systems for simulating logic may translate logic design into executable code for a multi-processor based parallel logic simulation device. A system may implement one or more parallel execution methods, which may include IPMD, MPMD, and/or DDMT.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a divisional application of U.S. patent application Ser. No. 11/937,577, filed Nov. 9, 2007, which claims the priority of U.S. Provisional Patent Application No. 60/866,517, filed on Nov. 20, 2006, both of which are incorporated herein by reference.

FIELD OF ENDEAVOR

Embodiments of the invention may address multi-core chip architectures that may be used for logic verification and associated methods for using such architectures.

BACKGROUND OF THE INVENTION

Existing logic verification technology is mostly based on the use of field-programmable gate arrays (FPGAs), a cluster of computers (e.g., PCs), or specially designed application-specific integrated circuit (ASIC) systems.

Current FPGA-based technologies usually try to directly map the target logic into a group of FPGAs and to emulate the target system. This approach is not scalable and becomes extremely expensive as the complexity of the target logic increases. Also, the synthesizing process normally takes a long time, which makes this approach very inefficient at the early stages of chip logic development, when design changes occur very often. Furthermore, FPGAs are intrinsically much slower than custom-designed circuits.

The biggest problem of simulating complex chip logic on a PC cluster is low performance. The main hindering factors are instruction and data cache locality that is not well-suited to this type of simulation, inefficient communication channels, and operating system overhead.

Some companies have developed dedicated logic simulation machines with specially designed ASICs to accelerate the logic simulation process. Those systems are usually extremely expensive to develop and upgrade, and tend to be less flexible than other types of systems. The existing machines are generally not commercially available to outside users.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention will now be described in conjunction with the attached drawings, in which:

FIG. 1 shows a conceptual diagram of various aspects of various embodiments of the invention;

FIG. 2 shows a conceptual block diagram of a representation synthesizer according to an embodiment of the invention;

FIG. 3 shows a conceptual block diagram of a particular logic simulation type that may be implemented in some embodiments of the invention;

FIG. 4 shows a conceptual block diagram of a particular logic simulation type that may be implemented in some embodiments of the invention;

FIG. 5 shows a conceptual block diagram of a particular logic simulation type that may be implemented in some embodiments of the invention;

FIG. 6 shows a conceptual block diagram of a system architecture to implement a particular logic simulation type according to an embodiment of the invention;

FIG. 7 shows a conceptual block diagram of a system architecture to implement a particular logic simulation type according to an embodiment of the invention;

FIG. 8 shows a conceptual block diagram of a system architecture to implement a particular logic simulation type according to an embodiment of the invention; and

FIG. 9 shows a conceptual block diagram of an exemplary system in which at least some portions of embodiments of the invention may be implemented, and/or which may be used along with various embodiments of the invention.

DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS

FIG. 1 shows a conceptual block/flow diagram that may be used to describe various embodiments of the invention. Embodiments of the invention may include a logic verification core (LVC) chip 11. LVC chip 11 may include a number of LVCs 112, each of which may have an associated memory M, and which may be interconnected by means of a network 111. The associated memory M may be an individual memory component for each LVC 112, or the associated memory may comprise a portion of a larger memory component that may be shared among multiple LVCs 112.

An LVC 112 may comprise a logic verification core processor 1131 (which may be referred to below as “LP”), and may include local data memory to hold various associated components 1132, such as input, output, etc. The logic verification core processor may also include local instruction memory 1133 for the LVC to access for execution.

Under traditional event-driven simulation (e.g., CSim), events may be generated when logic cells (netlist design) or signal variables (RTL design) change their values. These events may be stored in an event queue and eventually consumed by the simulation engine to update affected logic cells (netlist design) or RTL processes (RTL design).

In contrast, in embodiments of the invention, the input logic design may be translated into a program composed of a set of primitive logic operations, which may be arranged in such a way that the dependencies between the operations in the original input are satisfied. This may be based, at least in part, on the principle that, no matter how complex a logic circuit is, it may be mapped to a group of primitive logic operations, such as AND, OR, MUX, etc.

FIG. 1 further shows conceptually how an input logic design may be translated into code that may be used in the LVC chip 11. This process may be conceptually thought of as an LVC compiler, and the process as LVC compilation. Embodiments of the LVC compiler may be designed to structure the logic such that all state can be captured in explicit state elements, leaving the remaining logic to execute as purely combinational logic, as shown in FIG. 1. A logic design 12, which may be written in a hardware description language (HDL), such as Verilog or VHDL, may be fed to an LVC synthesizer 13. The LVC synthesizer 13 may output an LVC intermediate representation (LVC IR) 14.

Note that the LVC synthesizer 13 may be designed such that LVC IR 14 may be able to represent both the functional/applicative subset of the translated logic program and the associated non-functional/imperative parts. Optimizations may then be applied to increase simulation speed, reduce resource usage, and make trade-offs between these two, while generating the final logic programs that are to be mapped on the LVCs 112. This may be accomplished by LVC code generator 15, whose output may then be provided to an LVC chip 11.

In embodiments of the invention, a logic simulation may be converted for execution of the logic programs on logic processors. The LVC compiler (13-15) may be used to bridge the gap between target logic design source 12 and the LVC simulation hardware. The LVC compiling process may be divided into two stages: the “front end” handled by the LVC synthesizer 13 and the “back end” handled by the LVC code generator 15. The target logic design 12 may be written in any hardware description language (HDL) (Verilog, VHDL, etc.) and any code style (RTL or netlist). At the first stage, an LVC synthesizer 13, an embodiment of which is shown in further detail in FIG. 2, may be used to translate original logic design 12 (which may be expressed in an HDL program) into LVC IR 14, which, according to some embodiments of the invention, may be likened to a “dataflow” program graph, where nodes may be used to represent logic operations and arcs may be used to represent data dependencies. In the second stage, LVC code generator 15 may then compile the LVC IR into machine-level executable code that may be run on the LVC chip 11 by the LVC cores 112. This LVC program may preserve the data dependences of the original LVC IR 14, while the predefined logic cells in the LVC IR 14 may be simulated with a set of simple, fixed-width, and pipelined LVC instructions.

As shown in FIG. 1, the LVC chip 11 may have many LVC cores 112 interconnected by a network 111, which may be a fast crossbar network, for example. Different types of target logic designs may be partitioned, translated, and loaded with an LVC tool chain. Inputs to this program may include the input signals and original state bits, mapped to the memory cells. One iteration of the program execution on the logic processor may generate the output signals and the new state bits for the next iteration, which may then be mapped to the memory cells. A simulation of a logic design may include repetitive executions of the same program for a number of such iterations.
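The iteration scheme described above may be illustrated with a minimal Python sketch. It is not taken from the described embodiments: the 2-bit counter, the signal names (`en`, `q0`, `q1`, `count`), and the functions `step` and `simulate` are hypothetical and serve only to show one program being executed repeatedly, each iteration consuming input signals and old state bits and producing output signals and new state bits.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Signals = Dict[str, int]


def step(state: State, inputs: Signals) -> Tuple[Signals, State]:
    """One iteration of a hypothetical logic program: a 2-bit counter with enable."""
    q0, q1 = state["q0"], state["q1"]
    en = inputs["en"]
    # Combinational part, expressed with primitive operations only (XOR, AND).
    d0 = q0 ^ en
    d1 = q1 ^ (q0 & en)
    outputs = {"count": (q1 << 1) | q0}   # output signals for this iteration
    new_state = {"q0": d0, "q1": d1}      # state bits for the next iteration
    return outputs, new_state


def simulate(cycles: int) -> List[int]:
    """Run the same program repeatedly, feeding each new state into the next iteration."""
    state: State = {"q0": 0, "q1": 0}
    trace: List[int] = []
    for _ in range(cycles):
        outputs, state = step(state, {"en": 1})
        trace.append(outputs["count"])
    return trace
```

In this sketch, `simulate(5)` would run five iterations of the same `step` program with the enable input held at 1.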

As shown in FIG. 2, the LVC synthesizer 13 may be used to translate the target logic design 12 into LVC IR 14. The coding style of the target logic design 12 is not restricted to RTL or netlist. The translation process of LVC synthesizer 13 may be composed of two phases. In the first phase, the LVC synthesizer 13 may translate the source design 12 into a standard electronic design interchange format (EDIF) netlist format, using a parser 21. The EDIF netlist may still preserve a hierarchical structure. An LVC standard cell library 22 may be defined for a synthesizer 23 to generate the LVC IR output. In the second phase, the EDIF netlist may be translated by synthesizer 23 into LVC IR 14, in which the logic design may be viewed as having been “flattened,” and the netlist may look more like a dependence graph composed of primitive logic cells. The LVC synthesizer 13 may generally not perform any kind of optimization upon the design 12, to preserve the original logic structure for debugging purposes.

The LVC IR 14 may be thought of as a netlist composed of predefined primitive logic cells. The following is an example of an LVC IR 14:

Block ICache
  1 ZDIMES_DBG_FRP0_A WIRE Inputs I:1:W28 Outputs W28 Width 28
  2 ZDIMES_DBG_FRP1_A WIRE Inputs I:4:W28 Outputs W28 Width 28
  ...
  307 MC.DG.ZGROUP_EDGE.ZQ6 AND Inputs C:285:W1 C:306:W1 Outputs W1 Width 1
  308 MC.DG.ZGROUP_EDGE.ZQ7_1_1 CONST Inputs K:7 Outputs W3 Width 3
  ...
  9952 ZX_TOP_HAVE GLUE Inputs C:7773:W1 C:7772:W1 C:7771:W1 C:7770:W1 C:7769:W1 C:7768:W1 C:7767:W1 C:7766:W1 C:7765:W1 C:7764:W1 Outputs W10 Width 10
Inputs
  1 FRP0_A[27:0]
  2 FRP0_HAVE
  ...
  39 CLK
Outputs
  1 TOP_D_0[511:0] C:9833:1
  2 TOP_D_1[511:0] C:9834:1
  ...
  30 MTB[3:0] C:870:1

This example is an LVC IR 14 that may represent a hypothetical instruction cache unit. The block in this example has 9952 nodes, each one of which may correspond to a primitive logic cell. Every node may be represented with a one-line statement that may include a statement ID, statement name, logic operation type, input and output information, and width (in bits). The input information may define the type of the incoming source, which may be any one of three sources: module input, constant, or output of another node. At the end of the LVC IR 14 definition, the module inputs and outputs may be defined. For the module outputs, the sources of the outputs may be specified with a statement ID that may be associated with each one of the outputs. Those statements may correspond to the nodes that have their outputs directly connected to the module outputs. Those primitive logic cells may handle signals of variable width. The LVC logic processors 112 may, in some embodiments, comprise fixed 8-bit processing units. This is why one may need the LVC code generator to translate the primitive logic cells in the LVC IR 14 into a set of even more primitive fixed-width LVC instructions that may be executed by fixed-width logic processors.
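As an illustration of the statement layout just described, the following Python sketch parses one node statement into its fields. It is a hedged reading of the example listing, not a definition of the LVC IR: the names `Node`, `parse_node`, `source_kind`, and `chunks_needed` are hypothetical, and the interpretation of the input prefixes (`I` for module input, `K` for constant, `C` for the output of another node) is an assumption inferred from the example statements. The helper `chunks_needed` merely counts how many 8-bit units a signal of a given width would span.

```python
import math
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    stmt_id: int
    name: str
    op: str
    inputs: List[str]
    output: str
    width: int


# ID, name, operation, "Inputs" tokens, "Outputs" token, "Width" in bits.
NODE_RE = re.compile(
    r"^(\d+)\s+(\S+)\s+(\S+)\s+Inputs\s+(.*?)\s+Outputs\s+(\S+)\s+Width\s+(\d+)$"
)


def parse_node(line: str) -> Node:
    """Parse one node statement written in the layout of the example listing."""
    m = NODE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a node statement: {line!r}")
    stmt_id, name, op, inputs, output, width = m.groups()
    return Node(int(stmt_id), name, op, inputs.split(), output, int(width))


def source_kind(token: str) -> str:
    """Classify an input token by its prefix (assumed meaning, see lead-in)."""
    prefix = token.split(":", 1)[0]
    return {"I": "module input", "K": "constant", "C": "node output"}[prefix]


def chunks_needed(width: int, unit_bits: int = 8) -> int:
    """Number of fixed-width units needed to hold a signal of the given width."""
    return math.ceil(width / unit_bits)
```

Under these assumptions, a 28-bit WIRE node would span four 8-bit units, which suggests why a single primitive logic cell may expand into several fixed-width instructions.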

Aspects of embodiments of the invention relating to LVC code generation may feature a new method for register allocation and instruction scheduling that departs from the traditional implementation in normal optimizing compilers for general-purpose microprocessors. In logic verification simulations, there may simply be too many variables for the classical register allocation algorithm to work effectively. Heuristic approaches may be developed to reduce the compilation time without a significant increase in the demand for storage resources.

The LVC code generator 15 is the “back end” of the LVC logic compiler. It may translate the LVC IR 14 into the LVC executables that may be executed by multiple LVC logic processors 112. The LVC code generator 15 may generally be aware of the architectural features of the LVC logic processors 112. Those features may include the on-chip data memory size for each execution engine, the on-chip instruction memory size, and so on. LVC code generator 15 may try to schedule the logic instructions of the logic program so that the temporary storage needed during execution can fit into the on-chip memory of the LVC chip 11. The LVC code generator 15 may also generate debugging information at the same time for signal tracing support.

From the compiler's point of view, the LVC IR 14 may be thought of as a “basic block” composed of logic instructions (or nodes). These logic instructions may generally belong to either of two categories: combinatorial and sequential. The majority of the gates, such as AND, OR, DECODE, and so on, may be combinatorial, and signals may propagate through them in a certain order. The rest of the logic nodes in the LVC IR 14 may be registers (or other sequential instructions). They may retain their values during a simulation cycle until, for example, the next rising edge of the simulated clock, when they may be updated with new values. Given this observation, the LVC IR 14 may also be thought of as a directed acyclic graph (DAG), and the logic instructions may be scheduled to maintain the dependences the DAG imposes.
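One way to picture the scheduling constraint described above is a topological ordering of the combinatorial nodes. The following Python sketch uses Kahn's algorithm, a standard technique that is swapped in here for illustration and is not stated to be the method used by the LVC code generator 15. The function name `schedule` and the node names in the usage note are hypothetical. Register nodes are treated as sources because they hold their old values for the whole cycle, which is what makes the graph acyclic.

```python
from collections import deque
from typing import Dict, List, Set


def schedule(deps: Dict[str, List[str]], registers: Set[str]) -> List[str]:
    """Order combinatorial nodes so that every node follows the nodes it reads.

    deps maps each node to the names whose outputs it reads. Names that are
    not keys of deps (module inputs, constants) and register nodes impose no
    ordering, since their values are available at the start of the cycle.
    """
    comb = [n for n in deps if n not in registers]
    comb_set = set(comb)
    # Count only combinatorial-to-combinatorial edges.
    pending = {n: sum(1 for d in deps[n] if d in comb_set) for n in comb}
    users: Dict[str, List[str]] = {n: [] for n in comb}
    for n in comb:
        for d in deps[n]:
            if d in comb_set:
                users[d].append(n)
    ready = deque(n for n in comb if pending[n] == 0)
    order: List[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for u in users[n]:
            pending[u] -= 1
            if pending[u] == 0:
                ready.append(u)
    if len(order) != len(comb):
        raise ValueError("combinatorial loop detected")
    return order
```

For example, with a register `reg` whose next value is computed by `next`, and `next` reading both `inv` and `reg`, the feedback through `reg` does not create a cycle in the schedule because `reg` is read as an old value.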

For example, data storage for the register class of instructions may need to be specially treated with double buffering, one buffer for an old value and one for a new value. The register buffer updating may generally take place between two simulation cycles. Finally, a separate storage space may be allocated for the inputs and the outputs of the “basic block”, so that their values can be used to check the simulation result or to communicate with other simulated modules.
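The double buffering described above may be sketched as follows. The class `RegisterFile` and the function `swap_cycle` are hypothetical names used only for illustration. All reads during a cycle see the old buffer, all writes go to the new buffer, and the update takes place between cycles, so two registers that exchange values do not overwrite each other mid-cycle.

```python
from typing import Dict


class RegisterFile:
    """Registers with an old-value buffer (read) and a new-value buffer (written)."""

    def __init__(self, initial: Dict[str, int]) -> None:
        self.old = dict(initial)
        self.new = dict(initial)

    def read(self, name: str) -> int:
        # Reads always observe the value from the previous cycle.
        return self.old[name]

    def write(self, name: str, value: int) -> None:
        # Writes are staged and stay invisible until commit().
        self.new[name] = value

    def commit(self) -> None:
        # Performed between two simulation cycles.
        self.old = dict(self.new)


def swap_cycle(regs: RegisterFile) -> None:
    """One simulated cycle in which registers a and b exchange their values."""
    regs.write("a", regs.read("b"))
    regs.write("b", regs.read("a"))  # still reads the old value of a
    regs.commit()
```

With a single buffer, the second write in `swap_cycle` would read the value that the first write had just stored, and both registers would end up equal.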

The LVC cores 112 may be implemented by simple stack processors. The instruction set architecture (ISA) for the LVC cores 112 may be quite simple in that it may employ a simplified instruction set, compared to modern reduced instruction set computer (RISC) cores. For example, it may not be necessary to include operations on many data types (e.g., float types), nor many addressing modes. It may be supported by a very long instruction word (VLIW) structure that may be exploited by the LVC code generator 15 for multiple logic instruction issues.

The LVC chip architecture 11 may support three execution models: (1) the IPMD model; (2) the MPMD model; and (3) the DDMT model. These will be discussed further below. The LVC compiler may be directed, e.g., by a user, by a setting in the logic design code, or by some other means, to generate LVC code for one of these execution models.

Under an Identical Program Multiple Data (IPMD) execution model, a single copy of the program may be shared by all the LVC cores 112, and all LVC cores 112 may execute the program independently. This model may be particularly suitable for simulating an array of identical logic circuits and may be well-suited to simulating multiple cores in a multi-core chip. The repetitive functional units within a multi-core chip may be naturally mapped onto a group of LVC cores 112 that share the same target logic program.
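The IPMD model may be pictured with the following Python sketch, in which one shared instruction stream is applied to several private data memories. The instruction format `(op, dst, src1, src2)`, the operation table `OPS`, and the function `run_ipmd` are hypothetical and do not reflect the actual LVC instruction set; the 8-bit mask simply mirrors the fixed 8-bit processing units mentioned above.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, str, str, str]  # (operation, destination, source 1, source 2)

OPS: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def run_ipmd(program: List[Instr], memories: List[Dict[str, int]]) -> None:
    """Apply a single instruction stream to every core's own data memory."""
    for op, dst, src1, src2 in program:       # one shared instruction stream
        for mem in memories:                  # each core works on private data
            mem[dst] = OPS[op](mem[src1], mem[src2]) & 0xFF


# Hypothetical target logic: a half adder replicated on every core.
HALF_ADDER: List[Instr] = [
    ("XOR", "sum", "a", "b"),
    ("AND", "carry", "a", "b"),
]
```

Calling `run_ipmd(HALF_ADDER, memories)` with one memory per simulated instance would evaluate functionally identical but separate half adders from the same program.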

Under a Multiple Program Multiple Data (MPMD) execution model, each LVC core 112 may execute its own copy of a program independently. The execution of the LVC cores 112 may be loosely synchronized: the synchronization may be performed at properly placed barrier synchronizing points. At those synchronization points, interface signals may be exchanged between LVCs 112 to start the next simulation cycle.
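The loose synchronization of the MPMD model may be sketched with Python threads and `threading.Barrier` from the standard library. This is only a software analogy under stated assumptions: the function `run_mpmd`, the ring-shaped exchange in which each core reads its left neighbour's signal, and the use of two barriers per cycle are all hypothetical choices made for illustration.

```python
import threading
from typing import Callable, List


def run_mpmd(programs: List[Callable[[int], int]], cycles: int) -> List[int]:
    """Run one distinct program per core, exchanging signals at barriers."""
    n = len(programs)
    signals = [0] * n                 # interface signal published by each core
    barrier = threading.Barrier(n)

    def core(i: int) -> None:
        for _ in range(cycles):
            received = signals[(i - 1) % n]   # neighbour's value from last cycle
            barrier.wait()                    # all reads finish before any write
            signals[i] = programs[i](received) & 0xFF
            barrier.wait()                    # all writes finish before next cycle

    threads = [threading.Thread(target=core, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return signals
```

Between barriers each core runs at its own pace, which corresponds to the independent execution described above; only the exchange of interface signals is ordered.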

Under a Data-Driven Multithreaded (DDMT) execution model, each LVC may execute its own program. The execution of the sections of the program may be driven by “events”, which may correspond to data changes at the outputs of the primitive logic cells.

At the LVC chip level 11, embodiments of the invention may employ a multi-core architecture, which may use a shared memory organization, with or without relying on data caches. The explicit memory hierarchy may be exploited by the LVC code generator 15 to ensure that a local memory module of each core is best utilized by exploiting the locality in the LVC IR 14, by means of code partitioning.

As noted above, there may be three execution models (IPMD, MPMD, and DDMT) that may be chosen for simulation. The multi-core architecture of the LVC chip 11 may be adapted to accommodate these three execution models, as will be discussed in further detail below.

IPMD may be well-suited to simulate target logic with many repetitive logic modules. As shown in FIG. 3, in the IPMD mode, the LVC chip 11 may have only one instruction stream, and all logic processors (LVCs 112) may be executing exactly the same logic instructions. Because these logic processors may have their own LP RAMs and may be assigned separate memory spaces in off-chip DRAM, they may simulate functionally identical but physically separate logic elements in parallel while sharing the same logic instruction stream.

The LVC chip 11 may also be configured to let different logic processors execute different logic programs. This may be useful when the target logic is partitioned in such a way that not all sub-modules are identical. Even though the instruction sequencer in the LVC chip 11 may be able to support generating multiple instruction streams, the number of the instruction streams may be limited by the number of read ports of the internal instruction RAM. Therefore, the instruction RAM may be designed to be a set of smaller dual-port RAM blocks, as shown in FIG. 4. Just like the internal RAM blocks of common FPGAs, the connection of these smaller RAM blocks may be configured. They may be configured to compose a single RAM block, which may be used in IPMD mode (as shown in FIG. 3), or they may be configured to be a number of independent small RAM blocks to provide multiple instruction streams, as shown in FIG. 4.

In the DDMT mode, the LP 1131 may execute the logic instructions generated from a node in the LVC IR 14 only when any of its inputs has changed. In a provisional study, using a simple RISC processor logic as an example, it was discovered that, on average, fewer than 10% of the gates in the processor's logic actually produced different outputs every cycle. Given this, the DDMT mode may be able to save a lot of unnecessary execution time during simulation, and the simulation performance may be significantly improved. As shown in FIG. 5, the LP 1131 may be extended with a conditional enabling instruction that may enable the nodes of the LVC IR 14 that use the output of the node currently being executed, if that output has changed. LP RAM may be large enough to hold two copies (old and new values) of the outputs of all KSF nodes. In a real system, LP RAM may be implemented with a data cache. No coherence protocol may be needed among multiple LP RAMs because they may typically not share any data. Every LP 1131 may typically have its own execution flow. Therefore, as shown in FIG. 5, a separate instruction RAM may be connected to each of the LPs 1131. Those instruction RAMs may be instruction caches whose data may come from off-chip RAM. It is expected that the Enable Queue may have a limited size so that it can fit in the on-chip RAM. If the Enable Queue overflows, an error flag may be set. However, statistically, Enable Queue overflow is not expected to happen very often.
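The data-driven behavior described above may be modeled in software as follows. This Python sketch is a hypothetical analogy, not the LP 1131 hardware: the class `DDMTCore`, its methods, and the two-gate example in `demo` are illustrative names. A node is executed only after one of its inputs has changed, its consumers are enabled only if its own output changes, and an overflow flag is set if the bounded enable queue is full.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set, Tuple

NodeDef = Tuple[Callable[..., int], List[str]]  # (function, input names)


class DDMTCore:
    def __init__(self, nodes: Dict[str, NodeDef], queue_limit: int) -> None:
        self.nodes = nodes
        self.fanout: Dict[str, List[str]] = {}
        for name, (_, inputs) in nodes.items():
            for src in inputs:
                self.fanout.setdefault(src, []).append(name)
        self.values: Dict[str, int] = {}
        self.queue: Deque[str] = deque()
        self.queued: Set[str] = set()
        self.queue_limit = queue_limit
        self.overflow = False      # error flag set when the queue is full
        self.executed = 0          # number of node executions performed

    def _enable(self, name: str) -> None:
        if name in self.queued:
            return
        if len(self.queue) >= self.queue_limit:
            self.overflow = True
            return
        self.queue.append(name)
        self.queued.add(name)

    def _update(self, name: str, value: int) -> None:
        # Conditional enabling: consumers are queued only on a change.
        if self.values.get(name) != value:
            self.values[name] = value
            for consumer in self.fanout.get(name, []):
                self._enable(consumer)

    def set_input(self, name: str, value: int) -> None:
        self._update(name, value)

    def run(self) -> None:
        while self.queue:
            name = self.queue.popleft()
            self.queued.discard(name)
            fn, inputs = self.nodes[name]
            self.executed += 1
            self._update(name, fn(*(self.values.get(i, 0) for i in inputs)))


def demo() -> Tuple[int, int, bool]:
    core = DDMTCore(
        {"g1": (lambda a, b: a & b, ["a", "b"]),
         "g2": (lambda x, c: x | c, ["g1", "c"])},
        queue_limit=4,
    )
    for name in ("a", "b", "c"):
        core.set_input(name, 0)
    core.run()                 # initial evaluation of both gates
    core.set_input("c", 1)
    core.run()                 # only g2 is re-executed
    core.set_input("a", 1)
    core.run()                 # g1 runs, its output is unchanged, g2 is skipped
    return core.executed, core.values["g2"], core.overflow
```

In `demo`, the last input change re-executes `g1` but not `g2`, since the output of `g1` stays the same; this is the kind of saving the DDMT mode is intended to provide.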

Some of the functional blocks of the LVC chip 11, configured as an IPMD chip, are shown in FIG. 6, according to an embodiment of the invention. A host interface 61 may be used to enable a user to load instructions into the instruction memory 62 and to load data into LPs and the simulation control unit 63. All LPs may share the same instruction flow in the IPMD chip, and all off-chip memory accesses may be handled by the remote memory access server (RMS) 63. DDR2 controller 64 and SRAM controller 65 may be used to provide interfaces to off-chip DRAM and SRAM modules.

FIG. 7 shows a corresponding functional block diagram of an LVC chip 11 configured as an MPMD chip, according to an embodiment of the invention. As shown in FIG. 7, embodiments of the MPMD chip may differ from embodiments of the IPMD chip in that every LP 1131 may have its own instruction memory block 72. Every LP 1131 may thus have an independent instruction flow in parallel during a simulation.

FIG. 8 shows a similar functional block diagram of an LVC chip 11 configured as a DDMT mode chip, according to an embodiment of the invention. As shown in FIG. 8, every LP 1131 may have its own instruction and data memory caches, 72 and 86, respectively. The caches 86 may hold data from off-chip DRAM and SRAM and may be used to exploit locality in both the simulation program and the simulation data.

Various embodiments of the invention may comprise hardware, software, and/or firmware. FIG. 9 shows an exemplary system that may be used to implement various forms and/or portions of embodiments of the invention. Such a computing system may include one or more processors 92, which may be coupled to one or more system memories 91. Such system memory 91 may include, for example, RAM, ROM, or other such machine-readable media, and system memory 91 may be used to incorporate, for example, a basic I/O system (BIOS), operating system, instructions for execution by processor 92, etc. The system may also include further memory 93, such as additional RAM, ROM, hard disk drives, or other processor-readable media. Processor 92 may also be coupled to at least one input/output (I/O) interface 94. I/O interface 94 may include one or more user interfaces, as well as readers for various types of storage media and/or connections to one or more communication networks (e.g., communication interfaces and/or modems), from which, for example, software code may be obtained. Such a computing system may, for example, be used as a platform on which to run translation software and/or to control, house, or interface with an emulation system. Furthermore, other devices/media, such as FPGAs, may also be attached to and interact with the system shown in FIG. 9.

Various embodiments of the invention have now been discussed in detail; however, the invention should not be understood as being limited to these embodiments. It should also be appreciated that various modifications, adaptations, and alternative embodiments thereof may be made within the scope and spirit of the present invention.

1. A logic simulation integrated circuit comprising: a multiplicity of fixed-instruction-width core processors; a multiplicity of local memory blocks, each local memory block associated with one of said core processors; and instruction memory coupled to said core processors, wherein said core processors are to execute instructions in parallel, and wherein said instruction memory is to provide at least one logic instruction to one or more of said core processors, in parallel.
2. The integrated circuit according to claim 1, wherein said local memory blocks comprise a multiplicity of separate local memory blocks.
3. The integrated circuit according to claim 2, wherein each separate local memory block comprises a data cache to hold data from off-chip memory.
4. The integrated circuit according to claim 1, wherein said instruction memory comprises a multiplicity of parallel instruction memories, each to provide one or more instructions to one of said core processors.
5. The integrated circuit according to claim 1, wherein said integrated circuit is tailored to implement an execution model selected from the group consisting of: identical program multiple data (IPMD), multiple program multiple data (MPMD), and data-driven multi-threaded (DDMT).
6. The integrated circuit according to claim 1, further comprising: a host interface coupled to said core processors and to said instruction memory; and a control module coupled to said processors and to said host interface.
7. The integrated circuit according to claim 6, further comprising: one or more memory access controllers coupled to said control module and to one or more off-chip memory components.
8. The integrated circuit according to claim 7, wherein the integrated circuit is to implement an IPMD execution model, wherein the instruction memory is to provide a common instruction in parallel to each of the core processors, and wherein each of said core processors is associated with separate memory space in at least one off-chip memory component.
9. The integrated circuit according to claim 7, wherein the integrated circuit is to implement an MPMD execution model, wherein the instruction memory is to provide multiple instructions, in parallel, to the multiplicity of core processors, wherein at least two of the multiple instructions are different from each other.
10. The integrated circuit according to claim 7, wherein the integrated circuit is to implement a DDMT memory, wherein the instruction memory is to provide multiple instructions, in parallel, to the multiplicity of core processors, wherein at least two of the multiple instructions are different from each other, and wherein the instruction memory comprises multiple parallel instruction memories, each coupled to one of the respective core processors.
11. The integrated circuit according to claim 10, further comprising a multiplicity of enable queues, each associated with and coupled to a respective one of said core processors.
12. A simulation system comprising: a host computer system comprising at least one processor and a machine-readable medium containing software code that, when executed by the at least one processor, causes the at least one processor to implement a method of preparing code for execution by a multi-core logic simulation system, the method comprising: translating a target logic design from a high-level logic design language to an intermediate form comprising code lines, each code line including at least one logic operation and one or more data dependencies with respect to one or more other operations in the code lines; and translating the intermediate code into fixed-width instructions to be executed by core processors of said multi-core logic simulation system; and a logic simulation integrated-circuit device coupled to said host computer system and comprising: a multiplicity of fixed-instruction-width core processors; a multiplicity of local memory blocks, each local memory block associated with one of said core processors; and instruction memory coupled to said core processors, wherein said core processors are to execute instructions in parallel, and wherein said instruction memory is to provide at least one logic instruction to one or more of said core processors, in parallel.